Discovery of Diagnostic Patterns from Protein Sequence Databases
Abstract
We show how prior domain knowledge can be used in a system for mining databases of biological data. Our system performs automated discovery of diagnostic patterns from a database of protein sequences. Such patterns are used for classification of new sequences and for identification of biologically interesting positions in the proteins. The patterns have a simple syntax and can be translated into regular expressions, which can be used for rapid scanning of databases. Current pattern libraries are built semi-manually, since the correctness of a pattern depends on the incorporation of domain knowledge. Given the dramatic growth of the databases, it is desirable to automate this process. Our results show that the patterns derived by our fully automated system compete well with the semi-manually constructed patterns.
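The abstract does not spell out the pattern syntax, so the following is only a minimal sketch of how such a translation to regular expressions might look. It assumes a PROSITE-style notation (elements separated by `-`, `x` for any residue, `[..]` for allowed residues, `{..}` for forbidden residues, and `(n)` or `(n,m)` for repetition); the function name `prosite_to_regex` and the example pattern are hypothetical and not taken from the paper.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern (assumed syntax) into a Python regex."""
    pattern = pattern.rstrip(".")  # a trailing period is allowed as a terminator
    parts = []
    for element in pattern.split("-"):
        # '<' and '>' anchor the pattern to the sequence start and end.
        anchor_start = element.startswith("<")
        anchor_end = element.endswith(">")
        element = element.strip("<>")

        # Split the element into its core and an optional (n) or (n,m) suffix.
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, lo, hi = m.group(1), m.group(2), m.group(3)

        if core == "x":
            core = "."  # any residue
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"  # any residue except those listed
        # '[ABC]' and single letters are already valid regex fragments.

        repeat = ""
        if lo is not None:
            repeat = "{" + lo + ("," + hi if hi else "") + "}"

        parts.append(
            ("^" if anchor_start else "") + core + repeat + ("$" if anchor_end else "")
        )
    return "".join(parts)


# Illustrative pattern: S or T, any two residues, anything but P, then C.
regex = prosite_to_regex("[ST]-x(2)-{P}-C")
hit = re.search(regex, "AASGGACK")
miss = re.search(regex, "AASGGPCK")
```

Compiling the resulting expression once with `re.compile` and applying it to every sequence is what would make this kind of database scan fast in practice.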
Similar resources
Sequence-Structure Patterns: Discovery and Applications
Protein sequence data is being generated at a tremendous rate; however, functional annotation of these proteins is proceeding at a much slower pace. Biologists rely on computational biology and pattern recognition to predict the functionality of proteins. This is based on the fact that proteins that share a similar function often exhibit conserved sequence patterns. Such sequence patterns, or m...
Use of peptide library screening to detect a previously unknown linear diagnostic epitope: proof of principle by use of Lyme disease sera
Diagnostic peptides previously isolated from phage-displayed libraries by affinity selection with serum antibodies from patients with Lyme disease were found to give reproducible serum reactivity patterns when tested in two different enzyme-linked immunosorbent assay formats. In addition, the hypothetical possibility that peptides selected by this type of "epitope discovery" technique might ide...
Protein Databases
Proteins are sources of many peptides with diverse biological activity. Some of them are considered as valuable components of foods and drug targets with desired and designed biological activity. We are now entering an era rich in biological data in which the field of bioinformatics is poised to exploit this information in increasingly powerful ways. There are currently many databases all over ...
Data Mining and Knowledge Discovery in Molecular Databases - Session Introduction
The development and growth of molecular databases over the last decade has brought a growing problem to the biocomputing community. Our ability to analyze, summarize and extract information from these databases has lagged far behind our ability to collect and store data. As well, traditional methods for handling data, either automated or manual, cannot be effectively applied because of the volume...
iProsite: an improved prosite database achieved by replacing ambiguous positions with more informative representations
PROSITE database contains a set of entries corresponding to protein families, which are used to identify the family of a protein from its sequence. Although patterns and profiles are developed to be very selective, each may have false positive or negative hits. Considering false positives as items that reduce the selectiveness of a pattern, then, the more selective pattern we have, a more accur...
Publication date: 1998